Cooley-Tukey FFT on the Connection Machine
Abstract
We describe an implementation of the Cooley-Tukey complex-to-complex FFT on the Connection Machine. The implementation is designed to make effective use of the communications bandwidth of the architecture, its memory bandwidth, and storage with precomputed twiddle factors. The peak data motion rate achieved for the interprocessor communication stages is in excess of 7 Gbytes/s for a Connection Machine system CM-200 with 2048 floating-point processors. The peak rate of FFT computations local to a processor is 12.9 Gflops/s in 32-bit precision and 10.7 Gflops/s in 64-bit precision. The same FFT routine is used to perform both one- and multi-dimensional FFTs without any explicit data rearrangement. The peak performance for a one-dimensional FFT on data distributed over all processors is 5.4 Gflops/s in 32-bit precision and 3.2 Gflops/s in 64-bit precision. The peak performance for square, two-dimensional transforms is 3.1 Gflops/s in 32-bit precision, and for cubic, three-dimensional transforms the peak is 2.0 Gflops/s in 64-bit precision. Certain oblong shapes yield better performance. The number of twiddle factors stored in each processor is P/(2N) + log2 N for an FFT on P complex points uniformly distributed among N processors. To achieve this level of storage efficiency, we show that a decimation-in-time FFT is required for normal-order input, and a decimation-in-frequency FFT is required for bit-reversed input order.
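As a rough illustration of two points in the abstract, the following Python sketch evaluates the per-processor twiddle-factor count and runs a textbook radix-2 decimation-in-time FFT on normal-order input. It is a single-process sketch, not the Connection Machine implementation: the function names `twiddles_per_processor`, `fft_dit`, and `dft` are hypothetical, the formula is read as P/(2N) + log2 N, and both P and N are assumed to be powers of two.

```python
import cmath


def twiddles_per_processor(P, N):
    """Twiddle factors stored per processor, reading the abstract's
    formula as P/(2N) + log2(N). Assumes P and N are powers of two."""
    log2_n = N.bit_length() - 1  # exact log2 for a power of two
    return P // (2 * N) + log2_n


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT.
    Input is in normal order; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_dit(x[0::2])  # transform of even-indexed samples
    odd = fft_dit(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    # Illustrative sizes only: 2**20 points on 2048 processors.
    print(twiddles_per_processor(2**20, 2048))
    data = [complex(i, -i) for i in range(8)]
    err = max(abs(a - b) for a, b in zip(fft_dit(data), dft(data)))
    print(err < 1e-9)
```

With the illustrative sizes above, the count is 256 + 11 = 267 twiddle factors per processor, far fewer than the P/2 distinct factors a single processor would need to hold the full table.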
Similar resources
Different Approaches for OFDM Transmitter and Receiver Design in Hardware FPGA Design and Implementation with Performance Comparison
In this paper we investigate the performance (bandwidth and speed) that can be obtained using basic logic structures to implement an improved Cooley-Tukey algorithm for the IFFT/FFT in the transmitter/receiver, as well as an approach for implementing an OFDM processing unit based on an optimized ROM implementation. We implemented these techniques for OFDM on Virtex5 and Virtex7 FPGA boards, and we analyzed the...
Paired Faster FFT: Grigoryan FFT Implementation and Performance on Xilinx FPGAs and TMS DSPs
DOI: 10.5281/zenodo.55536. Abstract: The discrete Fourier transform (DFT) is a principal mathematical method for frequency analysis and has wide applications in engineering and the sciences. Because the DFT is so ubiquitous, fast methods for computing it have been studied extensively and continue to be an active research area. The way the DFT is split gives rise to various fast algorithms. In this paper, we...
Cooley-Tukey FFT like algorithms for the DCT
The Cooley-Tukey FFT algorithm decomposes a discrete Fourier transform (DFT) of size n = km into smaller DFTs of size k and m. In this paper we present a theorem that decomposes a polynomial transform into smaller polynomial transforms, and show that the FFT is obtained as a special case. Then we use this theorem to derive a new class of recursive algorithms for the discrete cosine transforms (...
DFT and FFT: An Algebraic View
In infinite, or non-periodic, discrete-time signal processing, there is a strong connection between the z-transform, Laurent series, convolution, and the discrete-time Fourier transform (DTFT) [10]. As one may expect, a similar connection exists for the DFT but bears surprises. Namely, it turns out that the proper framework for the DFT requires modulo operations of polynomials, which means work...